This I believe in genetics: discovery can be a nuisance, replication is science, implementation matters
Abstract
For several years now, the accumulation of genetic information has accelerated at a pace that exceeds the acceleration of computer capacity (Moore’s law), and there is no discernible limit to the prospects of further growth. As the cost per unit of obtained genetic information plummets (Niedringhaus et al., 2011), genetics has become a frontrunner and a catalyst of the informatics revolution that is affecting very diverse biomedical scientific fields. In non-biomedical science, roles somewhat equivalent to that of genetics have been assumed by other disciplines that are also driven by big data, e.g., observational astrophysics and high-energy and particle physics. This evolving big data paradigm offers the opportunity to rethink the priorities surrounding different steps in the scientific process. Making new discoveries has traditionally been deemed the most important aspect of scientific investigation. By “traditionally,” I mean the usual criteria of funding agencies, the publication priorities of major scientific journals, the selection processes for prestigious academic recognitions, and even the public imagination and fantasizing about what scientific investigation is all about. According to the most widespread cliché, scientists discover new things by collecting and analyzing more and more data. However, the genomic information explosion has caused an oversupply crisis. This crisis has drastically devalued the currency of discovery. Data are overabundant; most of them can be accumulated without any serious thinking; in fact, researchers contributing personal mental labor are not even needed to collect data: commercial chips do the trick, and robots do the pipetting. Not only are data abundant; discoveries are equally abundant. Even if we postulate a 1:1,000,000 ratio of claimed discoveries to data items, there are zillions of discoveries that can now be claimed every day.
Based on what we have started to surmise empirically, most of these claimed discoveries are likely to be either totally false preliminary observations (Ioannidis, 2005) or substantially exaggerated results (Ioannidis, 2008), a consequence of the extreme multiplicity of the probed data-space, the winner’s curse (Zollner and Pritchard, 2007), and other biases. “Negative” results have almost disappeared from many scientific fields, especially those with “softer” measurements and more flexible analytical tools (Fanelli, 2010). Results procured by the most popular research sub-fields seem to have the lowest reliability (Pfeiffer and Hoffman, 2009). It seems likely that there is an extraordinarily large number of small, weak effects and links (“risks” in epidemiological language), barely discernible from measurement error and diverse potential biases. Single discoveries made in single databases are likely to mean very little; they are mostly a nuisance that propagates confusion in the literature. Exceptions certainly occur, and some strong/large effects may still exist, awaiting discovery. Even then, it is unlikely that the discoverer who hits upon them will have any more merit than the thousands of other researchers who only come across the flooding multitude of weak or false effects. The process of rewarding discoverers claiming large effects (be that with grants, tenure, or Nobel prizes) may eventually become indistinguishable from running a lottery. If we add human nature, biases, and conflicts (Ioannidis, 2011), a lottery system may even be preferable. In settings where claimed discoveries outnumber what we can absorb and tolerate, and when most claims about discoveries are false, replication becomes the most important, central piece of science. Replication efforts typically require a shift toward team science (e.g., consortia; Austin et al., 2012). They place emphasis on a community effort to find the few true leads among the many wrong ones proposed.
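The multiplicity argument above can be made concrete with a small Bayes' rule calculation. The sketch below is illustrative only and is not taken from the abstract: the function name, the prior of 1 true association per 10,000 probed, the 50% power, and the significance thresholds are all hypothetical assumptions, and the calculation ignores bias and treats replication studies as independent.

```python
def positive_predictive_value(prior, power, alpha, n_studies=1):
    """Probability that a claimed association is true, by Bayes' rule.

    prior     -- assumed fraction of probed associations that are truly non-null
    power     -- assumed probability that one study detects a true association
    alpha     -- significance threshold (false-positive rate per null association)
    n_studies -- number of independent studies that must all be significant

    Simplifying assumptions (hypothetical): no bias, independent studies,
    and the same power and alpha in every study.
    """
    true_hits = prior * power ** n_studies
    false_hits = (1.0 - prior) * alpha ** n_studies
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    prior = 1e-4  # hypothetical: 1 true association per 10,000 probed
    power = 0.5   # hypothetical per-study power

    scenarios = [
        ("single study, alpha = 0.05", 0.05, 1),
        ("discovery plus one replication, alpha = 0.05", 0.05, 2),
        ("single study, alpha = 5e-8", 5e-8, 1),
    ]
    for label, alpha, n_studies in scenarios:
        ppv = positive_predictive_value(prior, power, alpha, n_studies)
        print(f"{label}: PPV = {ppv:.4f}")
```

Under these assumed numbers, a single nominally significant finding is almost always a false positive, and each independent replication or a much stricter threshold raises the share of claims that are true. The figures change with the assumed prior and power, so they should be read as an illustration of the direction of the effect rather than as estimates for any real field.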
Replication offers a realistic chance of keeping the scientific literature reasonably noise-free. Genetics has shown clearly how important this is. Human genome epidemiology was radically transformed in the last decade by the adoption of a rigorous replication culture. While the vast majority of claims for genetic associations based on biological plausibility speculations and performed by single teams without replication were apparently wrong (Ioannidis, 2011), large meta-analyses of genome-wide association studies using agnostic platforms and sine-qua-non, rigorous replication across multiple teams and multiple datasets have yielded thousands of associations with unquestionably high credibility (Hindorff et al., 2009). How many other scientific fields are still conducting studies based on biological plausibility speculations and performed by single teams without replication? Probably most of the literature in diverse fields has been based on these same premises and will likely collapse once rigorous replication practices are adopted. As replication creates an expanding, more reliable basis of knowledge, the need to further translate and implement this knowledge also becomes essential. Until now, research emphasis (and funding) has been placed disproportionately on T0 (discovery research) and some T1 (research for development of new tests or therapies) (Schully et al., 2011), with exponentially
Volume: 4
Publication date: 2013